updated BERT processing to current interface #1223

piotrpiekos · 2020-11-15T18:34:30Z

Adapts the old code to the new interface. For now it only fine-tunes BERT on a chosen GLUE dataset, without evaluating on the test set.

I didn't use SubwordTextEncoder because it produces different token IDs for words that consist of more than one token (e.g. "neglecting").
I didn't upgrade tfds_stream to data_stream, because BERT models may need to do processing after batching. This should be revisited after more BERT family models are implemented.
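
To illustrate the token ID issue above: BERT checkpoints expect IDs from the WordPiece vocabulary they were pre-trained with, so a word that splits into several pieces has to be split the same way and mapped through the same vocabulary. Below is a minimal sketch of greedy longest-match-first WordPiece splitting. The vocabulary `TOY_VOCAB` and the helper names `wordpiece_tokenize` and `encode` are made up for illustration; this is not the code from this PR, and the real BERT vocabulary may split "neglecting" differently.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first split of one word into WordPiece tokens."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Pieces that continue a word carry the "##" prefix.
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece matches at this position: the whole word is unknown.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


# Toy vocabulary (hypothetical, NOT the real BERT vocabulary or its IDs).
TOY_VOCAB = {"[UNK]": 0, "neglect": 1, "##ing": 2, "##ed": 3, "the": 4}


def encode(word, vocab=TOY_VOCAB):
    """Map a word to the IDs of its WordPiece tokens."""
    return [vocab[piece] for piece in wordpiece_tokenize(word, vocab)]


pieces = wordpiece_tokenize("neglecting", TOY_VOCAB)
ids = encode("neglecting")
print(pieces, ids)
```

An encoder that uses a different splitting rule or a different vocabulary would assign other IDs to the same word, and those IDs would not line up with the pre-trained embedding table.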

afrozenator · 2020-11-18T22:51:47Z

Hi @piotrekp1 - Thanks for adding this! I'll accept it right now without changes, but in a future PR could you move it to trax/data instead and add a simple unit test there? (That package should have example tests with testdata that will make this easy.)

afrozenator · 2020-11-18T22:52:01Z

@henrykmichalewski as FYI

piotrpiekos · 2020-11-19T05:41:57Z

Hi @afrozenator. I'll do that, thank you!

updated BERT processing to current interface

6a1d286

google-cla bot added the cla: yes label Nov 15, 2020

afrozenator added the ready to pull label Nov 16, 2020

copybara-service bot merged commit be84553 into google:master Nov 19, 2020
